31 research outputs found

    Associative learning on imbalanced environments: An empirical study

    Associative memories have emerged as a powerful computational neural network model for several pattern classification problems. Like most traditional classifiers, these models assume that the classes share similar prior probabilities. However, in many real-life applications the ratios of prior probabilities between classes are extremely skewed. Although the literature has provided numerous studies that examine the performance degradation of renowned classifiers on different imbalanced scenarios, so far this effect has not been supported by a thorough empirical study in the context of associative memories. In this paper, we focus on the applicability of associative neural networks to the classification of imbalanced data. The key questions addressed here are whether these models perform better than, the same as, or worse than other popular classifiers, how the level of imbalance affects their performance, and whether distinct resampling strategies have a different impact on the associative memories. To answer these questions and gain further insight into the feasibility and efficiency of associative memories, a large-scale experimental evaluation with 31 databases, seven classification models and four resampling algorithms is carried out, along with a non-parametric statistical test to discover any significant differences between each pair of classifiers. This work has partially been supported by the Mexican Science and Technology Council (CONACYT-Mexico) through the Postdoctoral Fellowship Program (232167), the Mexican PRODEP (DSA/103.5/15/7004), the Spanish Ministry of Economy (TIN2013-46522-P) and the Generalitat Valenciana (PROMETEOII/2014/062)

    Back propagation with balanced MSE cost Function and nearest neighbor editing for handling class overlap and class imbalance

    The class imbalance problem has been considered a critical factor in designing and constructing supervised classifiers. In the case of artificial neural networks, this complexity negatively affects the generalization process on under-represented classes. However, it has also been observed that the decrease in the performance of standard learners is not caused solely by class imbalance, but is also related to other difficulties, such as class overlap. In this work, a new empirical study on handling class overlap and class imbalance in multi-class problems is described. To address both difficulties, we propose the joint use of editing techniques and a modified MSE cost function for the MLP. The analysis was carried out on remote sensing data. The experimental results demonstrate the consistency and validity of the combined strategy proposed here. Partially supported by the Spanish Ministry of Education and Science under grants CSD2007–00018, TIN2009–14205–C04–04, and by Fundació Caixa Castelló–Bancaixa under grants P1–1B2009–04 and P1–1B2009–45; SDMAIA-010 of the TESJO and 2933/2010 from the UAE

    Virtual reality and virtual environments as support for bringing the university and the community closer together: the case of the Faculty of Engineering of UAEMex

    Most higher education institutions have websites through which they publish relevant information about the degree programs and services they offer. However, the information shown is commonly insufficient, which may cause users to lose interest when they visit the site and find information too limited for their needs. To address this, a virtual reality system was developed at the Faculty of Engineering of the Universidad Autónoma del Estado de México (UAEM) that allows the general community to get to know the most representative places of the faculty. With the implementation of this site, the aim is for virtual visitors to have prior knowledge of both the areas and the services the faculty offers, and thereby to bring the enrolled university community and the general public closer to the daily life of the university

    Handling imbalance in multi-class problems with ECOC

    The class imbalance problem can seriously degrade the effectiveness of a classifier, particularly on the patterns of the least represented classes. Imbalance in the training set (TS) means that one class is represented by a large number of patterns while another is represented by very few. Existing studies mainly address two-class problems; nevertheless, a large number of real problems involve multiple classes, which are harder for the classifier to discriminate. The success of the Mixture of Experts (ME) rests on the divide-and-conquer principle. In its general operation, the problem is divided into smaller fragments that are studied separately. In this way, the overall model is only slightly influenced by the individual difficulties of its components. The main idea of the study presented here is to build a Mixture of Experts whose members are each trained on one part of the general problem, and thereby to improve classifier performance in the multi-class context. To this end, we use the methods known as error-correcting output codes (ECOC), which allow the problem under study to be encoded as pairs of classes. Experimental results on real data sets show the viability of the proposed strategy
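
    The pairwise class coding mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard one-vs-one ECOC layout (one column per pair of classes, with +1 and -1 marking the two classes and 0 marking classes the expert ignores) and a simple disagreement-count decoder. The function names `pairwise_code_matrix` and `decode` are hypothetical, and nothing here is taken from the paper's own implementation.

```python
from itertools import combinations


def pairwise_code_matrix(n_classes):
    """Build a one-vs-one ECOC matrix: one row per class, one column per class pair.

    Assumption: standard pairwise coding, not necessarily the paper's exact design.
    """
    pairs = list(combinations(range(n_classes), 2))
    matrix = [[0] * len(pairs) for _ in range(n_classes)]
    for col, (i, j) in enumerate(pairs):
        matrix[i][col] = 1    # the expert for this column treats class i as positive
        matrix[j][col] = -1   # and class j as negative; all other classes stay 0
    return matrix, pairs


def decode(outputs, matrix):
    """Return the class whose codeword disagrees least with the expert outputs.

    Zero entries in a codeword are skipped because that expert never saw the class.
    """
    def disagreements(codeword):
        return sum(1 for c, o in zip(codeword, outputs) if c != 0 and c != o)

    return min(range(len(matrix)), key=lambda k: disagreements(matrix[k]))


matrix, pairs = pairwise_code_matrix(3)
# Hypothetical outputs of the three pairwise experts for one test pattern.
predicted = decode([-1, 1, 1], matrix)
```

    Each column corresponds to one expert trained only on the two classes it separates, so a heavily imbalanced multi-class problem becomes a set of smaller two-class problems, in line with the divide-and-conquer idea in the abstract.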

    A hybrid method to face class overlap and class imbalance on neural networks and multi-class scenarios

    Class imbalance and class overlap are two of the major problems in data mining and machine learning. Several studies have shown that these data complexities may affect the performance or behavior of artificial neural networks. Strategies proposed to deal with these two challenges have been applied separately. In this paper, we introduce a hybrid method for handling both class imbalance and class overlap simultaneously in multi-class learning problems. Experimental results on five remote sensing data sets show that the combined approach is a promising method

    A New Under-Sampling Method to Face Class Overlap and Imbalance

    Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining as they may cause a significant loss in performance. Several solutions have been proposed to face both data difficulties, but most of these approaches tackle each problem separately. In this paper, we propose a two-stage under-sampling technique that combines the DBSCAN clustering algorithm to remove noisy samples and clean the decision boundary with a minimum spanning tree algorithm to face the class imbalance, thus handling class overlap and imbalance simultaneously with the aim of improving the performance of classifiers. An extensive experimental study shows a significantly better behavior of the new algorithm as compared to 12 state-of-the-art under-sampling methods using three standard classification models (nearest neighbor rule, J48 decision tree, and support vector machine with a linear kernel) on both real-life and synthetic databases

    Equilibrating the recognition of the minority Class in the imbalance context

    In pattern recognition, it is well known that classifier performance depends on the classification rule and on the complexities present in the data sets (such as class overlap, class imbalance, outliers and high dimensionality, among others). The issue of class imbalance arises when one class is less represented than the other classes. If the classifier is trained with imbalanced data sets, its natural tendency is to recognize the samples of the majority class and ignore the minority classes. This situation is undesirable because in real problems it is necessary to recognize the minority class better without sacrificing precision on the majority class. In this work we analyze the behaviour of four classifiers, taking into account a relative balance among the per-class accuracies

    Identification of Latent Topics in Patients Surviving COVID-19 in Mexico

    With the outbreak of the SARS-CoV-2 (COVID-19) pandemic, multiple studies of risk factors and their influence on patient deaths have been developed. However, little attention is paid to analyzing patients in risk groups who, despite having been infected and hospitalized, survive. Using the dataset available from the Ministry of Health of Mexico, this paper proposes the use of the latent topic extraction algorithm Latent Dirichlet Allocation (LDA) for the study of COVID-19 survival factors in Mexico. The results let us conclude that, in the year before strategies for the prevention and control of COVID-19 were in place, the latent topics indicate that patients without comorbidities have a low risk of death, whereas in 2021 patients can survive in spite of having some risk factors

    Empirical Study of the Associative Approach in the Context of Classification Problems

    Research carried out by the scientific community has shown that the performance of classifiers depends not only on the learning rule but also on the complexities inherent in the data sets. Some traditional classifiers have been commonly used in the context of classification problems (three neural networks, C4.5, SVM, among others). However, the associative approach has been explored more in the recovery context than in the classification task, and its performance has scarcely been analyzed when several complexities are present in the data. The present investigation analyzes the performance of the associative approach (CHA, CHAT and original Alpha Beta) when three classification problems occur (class imbalance, overlapping and atypical patterns). The results show that the CHAT algorithm recognizes the minority class better than the rest of the classifiers in the context of class imbalance. However, the CHA model ignores the minority class in most cases. In addition, the CHAT algorithm needs well-defined decision boundaries: its performance increases when Wilson's method is applied. It was also noted that when a balance between the rates is emphasized, the performance of three classifiers increases (RB, RFBR and CHAT). The original Alpha Beta model still shows poor performance when the data are pre-processed. In the context of class imbalance, the performance of the classifiers increases significantly when the SMOTE method is applied, which does not occur without pre-processing or with under-sampling

    Under-sampling techniques, decision making and diversity analysis in supervised learning with Multiple Classifier Systems

    This doctoral thesis primarily analyzes the applicability of Multiple Classifier Systems (MCS) within the framework of the nearest neighbor rule. A first main line of research focuses on preprocessing algorithms, with the aim of solving different problems related to the quality of the training sample: the presence of redundant, atypical or noisy patterns, excessively large databases, and imbalance between the class distributions. Another highly relevant aspect concerns the effectiveness of the individual components of the MCS within the voting method, for which new techniques for dynamic and static weighting of the individual decisions are proposed. The third central point is the analysis of classifier diversity, using several measures from the related literature. Other questions analyzed at length throughout the thesis are sampling techniques (bagging, boosting, arcing and random sequential selection), the size of the MCS and, finally, the feasibility of using two artificial neural network models (multilayer perceptron and modular network)